SQL Server 2008 Analysis Services: Understanding the SSAS Environment Wizards (part 1)

12/12/2010 3:26:03 PM

Welcome to the “land of wizards.” This implementation of SSAS, like older versions, is heavily wizard oriented. SSAS has a Cube Wizard, a Dimension Wizard, a Partition Wizard, a Storage Design Wizard, a Usage Analysis Wizard, a Usage-Based Optimization Wizard, an Aggregation Wizard, a Calculated Cells Wizard, a Mining Model Wizard, and a few other wizards. All of them are useful, and many of their capabilities are also available through editors and designers. A wizard helps those who want some structure in the definition process and who are content to rely on defaults for much of what they need. The wizards are also plug-and-play oriented and have been made available in all SQL Server and .NET development environments. In other words, you can access these wizards wherever and whenever you need them. All the wizard-based capabilities can also be coded in MDX, DMX, and ASSL.

Figure 1 shows how SSAS fits into the overall scheme of the SQL Server 2008 environment. SSAS has become completely integrated into the SQL Server platform. Many different mechanisms, such as SSIS and direct data source access, can funnel a vast amount of data into the SSAS environment. Most of the cubes you build will likely be read-only because they will be used for business intelligence (BI). However, a write-enabled capability (WriteBack) is available in SSAS for situations that meet certain data updatability requirements.

Figure 1. SSAS as part of the overall SQL Server 2008 environment.

As you can also see in Figure 1, the basic components in SSAS are all focused on building and managing data cubes. SSAS consists of the analysis server, processing services, integration services, and a number of data providers. SSAS has both server-based and client-based (local) capabilities. Together, these provide a complete platform for OLAP.

You create cubes by preprocessing aggregations (that is, precalculated summary data) that reflect the desired levels within dimensions and support the type of querying that will be done. These aggregations provide the mechanism for rapid and uniform response times to queries. You create them before the user uses the cube. All queries utilize either these aggregations, the cube’s source data, a copy of this data in a client cube, data in cache, or a combination of these sources. A single Analysis Server can manage many cubes. You can have multiple SSAS instances on a single machine.
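The idea of answering queries from precalculated summaries rather than from the raw rows can be illustrated with a minimal Python sketch. It is not how SSAS stores or selects aggregations internally; the fact rows, the `DIMENSIONS` tuple, and the `build_aggregations` and `query` helpers are all hypothetical names invented for this illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (product, month, country, units)
FACTS = [
    ("laptop", "2010-01", "US", 5),
    ("laptop", "2010-01", "DE", 3),
    ("laptop", "2010-02", "US", 4),
    ("server", "2010-01", "US", 1),
    ("server", "2010-02", "DE", 2),
]
DIMENSIONS = ("product", "month", "country")


def build_aggregations(facts):
    """Precompute SUM(units) for every combination of dimensions."""
    aggregations = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(range(len(DIMENSIONS)), size):
            totals = defaultdict(int)
            for row in facts:
                key = tuple(row[i] for i in dims)
                totals[key] += row[-1]
            names = tuple(DIMENSIONS[i] for i in dims)
            aggregations[names] = dict(totals)
    return aggregations


def query(aggregations, **criteria):
    """Answer a query from the precomputed totals, not from the raw rows."""
    names = tuple(d for d in DIMENSIONS if d in criteria)
    key = tuple(criteria[d] for d in names)
    return aggregations[names].get(key, 0)


# The "processing" step happens once, before any user query arrives.
AGGS = build_aggregations(FACTS)

print(query(AGGS, product="laptop"))               # 12
print(query(AGGS, month="2010-01", country="US"))  # 6
print(query(AGGS))                                 # 15 (grand total)
```

Every query here is a dictionary lookup, which is why response times stay uniform regardless of how many fact rows were summarized. A real cube would normally precompute only a chosen subset of these combinations rather than all of them.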

By orienting around the Unified Dimensional Model (UDM), SSAS allows for the definition of a cube that contains measures and dimensions. Each cube dimension can contain a hierarchy of levels that specifies the natural categorical breakdown users drill down into for more detail.

The data values within a cube are represented by measures (the facts). Each measure might use a different aggregation function, depending on the type of data. Unit data might require the SUM (summarization) function, Date of Receipt data might require the MAX function, and so on. Members of a dimension are the actual level values, such as a particular product number, a particular month, or a particular country. Capacity limits are rarely a practical concern: SSAS addresses up to 2,147,483,647 of most anything within its environment (for example, dimensions in a database, attributes in a dimension, databases in an instance, levels in a hierarchy, cubes in a database, measures in a cube). In reality, you will probably not have more than a handful of dimensions. Remember that dimensions are the paths to the interesting facts. Dimension members should be textual and are used as criteria for queries and as row and column headers in query results.
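The point that each measure carries its own aggregation function can be sketched in a few lines of Python. This is only an illustration under assumed data: the `ROWS` sample, the `MEASURES` mapping, and the `aggregate_by` helper are hypothetical and do not correspond to any SSAS API.

```python
# Hypothetical fact rows for illustration only.
ROWS = [
    {"product": "laptop", "units": 5, "received": "2010-01-14"},
    {"product": "laptop", "units": 3, "received": "2010-02-02"},
    {"product": "server", "units": 1, "received": "2010-01-30"},
]

# Each measure names the aggregation function that suits its data.
MEASURES = {
    "units": sum,      # unit counts are additive
    "received": max,   # latest date; ISO dates sort correctly as strings
}


def aggregate_by(rows, dimension):
    """Group rows by one dimension and apply each measure's own function."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[dimension], []).append(row)
    return {
        member: {
            name: func(r[name] for r in members)
            for name, func in MEASURES.items()
        }
        for member, members in grouped.items()
    }


RESULT = aggregate_by(ROWS, "product")
print(RESULT["laptop"])  # {'units': 8, 'received': '2010-02-02'}
print(RESULT["server"])  # {'units': 1, 'received': '2010-01-30'}
```

The dimension members ("laptop", "server") end up as the keys of the result, which mirrors how members serve as row and column headers in query results.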

Every cube has a schema from which the cube draws its source data. The central table in a schema is the fact table that yields the cube’s measures. The other tables in the schema are the dimension tables that are the source of the cube dimensions. A classic star-schema data warehouse design has this central fact table along with multiple dimension tables. This is a great starting point for OLAP cube creation, as you can see in Figure 2. It shows a high-tech company’s computer sales star-schema data warehouse that can be used as the source for building an OLAP cube within SSAS.

Figure 2. A star-schema data warehouse design with a central fact table and multiple dimensions of these facts as the source for an OLAP cube in SSAS.
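A star schema of this shape can be sketched with Python's built-in `sqlite3` module. SQLite stands in here only as a convenient relational store, and the table names, column names, and sample rows are assumptions made for this example rather than the actual schema in Figure 2.

```python
import sqlite3


def build_star_schema():
    """Create a tiny in-memory star schema: one fact table, three dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE dim_time    (time_id    INTEGER PRIMARY KEY, month TEXT);
        CREATE TABLE dim_geo     (geo_id     INTEGER PRIMARY KEY, country TEXT);
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            time_id    INTEGER REFERENCES dim_time(time_id),
            geo_id     INTEGER REFERENCES dim_geo(geo_id),
            units      INTEGER,
            price      REAL
        );
        INSERT INTO dim_product VALUES (1, 'laptop'), (2, 'server');
        INSERT INTO dim_time    VALUES (1, '2010-01'), (2, '2010-02');
        INSERT INTO dim_geo     VALUES (1, 'US'), (2, 'DE');
        INSERT INTO fact_sales  VALUES
            (1, 1, 1, 5, 900.0),
            (1, 2, 2, 3, 950.0),
            (2, 1, 1, 1, 4000.0);
    """)
    return conn


def units_by(conn, dim_table, key, label):
    """Sum the units measure along one dimension.

    The identifiers are fixed in code, never taken from user input.
    """
    sql = f"""
        SELECT d.{label}, SUM(f.units)
        FROM fact_sales AS f
        JOIN {dim_table} AS d ON d.{key} = f.{key}
        GROUP BY d.{label}
        ORDER BY d.{label}
    """
    return conn.execute(sql).fetchall()


CONN = build_star_schema()
print(units_by(CONN, "dim_product", "product_id", "product_name"))
# [('laptop', 8), ('server', 1)]
print(units_by(CONN, "dim_geo", "geo_id", "country"))
# [('DE', 3), ('US', 6)]
```

The fact table holds only keys and measures, while the descriptive text lives in the dimension tables. That separation is what lets each dimension table become a cube dimension and the fact table supply the measures.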

SSAS allows you to build dimensions and cubes from heterogeneous data sources. It can access relational OLTP databases, multidimensional databases, text data, and any other source that has an OLE DB provider available. You don’t have to move all your data first; you just connect to its source. In SSAS, you can also design OLAP cubes from scratch. Then you can have SSAS create the relational schema of tables in SQL Server that you want to populate with the transactional data that will drive the OLAP cube.

Cubes can be either regular or local. Regular cubes are based on real tables as the data source, have aggregations, and occupy physical storage space of some kind. If a data source that contributes to a regular cube changes, the cube must be reprocessed. Figure 3 shows this cube representation and the partitions that make up a cube.

Figure 3. The SSAS cube representations: regular OLAP cubes and partitions.

Local cubes are entirely contained in portable local cube files (.cub files) and can be browsed without a connection to an SSAS instance. This is really like being in “disconnected” mode.

Write-enabled dimensions within a cube allow updates to the data, and those updates can be written back to the data sources.

Following is a quick summary of all the essential cube terms in SSAS:

Database— A database is a logical container of one or more cubes. Cubes are defined within Analysis Server databases.
Cube— A cube is a multidimensional representation of the business facts. Types of cubes are regular and local.
Data source— The data source is the origin of a cube’s data.
Measure group— This group is a collection (or grouping) of one or more measures into some type of logical unit for business purposes. A measure group does not occupy any physical space. It is metadata only.
Measure— A measure is a data fact representation. A measure is typically a data value fact, such as price, unit, or quantity.
Cell— A cell is the part of a data measure that is at the intersection of the dimensions. The cell contains the data value. If an intersection (that is, cell) has no value yet, it does not physically exist until it is populated.
Dimension— A cube’s dimension is defined by the aggregation levels of the data that are needed to support the data requirements. A dimension can be shared with other cubes, or it can be private to a cube. The structure of a dimension is directly related to the dimension table columns, member properties, or structure of OLAP data mining models. This structure becomes the hierarchy and should be organized accordingly. You can also have strict parent/child dimensions in which two columns are identified as being parent and child and the dimension is organized according to them. In a regular dimension, each column in the dimension contributes a hierarchy level.
Level— A level includes the nodes of the hierarchy or data mining model. Each level contains the members. Millions of members are possible for each level.
Partition— A cube comprises one or more partitions. Using a partition is a way to physically separate parts of a cube. This separation essentially lets you deal with individual slices of a data cube separately, querying only the relevant data sources. If you partition by dimension, you can perform incremental updates to change that dimension independently of the rest of the cube. Consequently, you have to reprocess only the aggregations that are affected by those changes. This is an excellent feature for scalability.
Hierarchy— A hierarchy is a set of members in a dimension and their position relative to each other. Hierarchies can be either balanced or unbalanced. Being balanced simply means that all branches of the hierarchy descend to the same level. An unbalanced hierarchy allows for branches to descend to different levels. It is also possible to define more than one hierarchy for a single dimension. A great example of this is “fiscal calendar time” and “Gregorian calendar time” being defined in one dimension: a Time dimension that contains both time.gregorian and time.fiscal.
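The difference between balanced and unbalanced hierarchies in the summary above can be made concrete with a short Python sketch. Hierarchies are modeled here as nested dictionaries, which is an assumption of this example and not an SSAS structure; `leaf_depths`, `is_balanced`, and the sample data are hypothetical.

```python
def leaf_depths(tree, depth=0):
    """Return the depth of every leaf in a nested-dict hierarchy."""
    if not tree:  # an empty dict marks a leaf member
        return [depth]
    depths = []
    for child in tree.values():
        depths.extend(leaf_depths(child, depth + 1))
    return depths


def is_balanced(tree):
    """A hierarchy is balanced when all branches descend to the same level."""
    return len(set(leaf_depths(tree))) == 1


# One Time dimension holding two hierarchies (year > quarter > month).
TIME_DIMENSION = {
    "gregorian": {"2010": {"Q1": {"Jan": {}, "Feb": {}, "Mar": {}},
                           "Q2": {"Apr": {}, "May": {}, "Jun": {}}}},
    "fiscal":    {"FY2010": {"FQ1": {"Jul": {}, "Aug": {}, "Sep": {}}}},
}

# An organization chart in which branches stop at different levels.
ORG_CHART = {"CEO": {"CFO": {}, "CTO": {"Dev lead": {}}}}

print(is_balanced(TIME_DIMENSION["gregorian"]))  # True
print(is_balanced(TIME_DIMENSION["fiscal"]))     # True
print(is_balanced(ORG_CHART))                    # False
```

Both calendar hierarchies descend three levels on every branch, so they are balanced. The organization chart is unbalanced because the CFO branch ends one level above the Dev lead branch.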